ENH: ssh: Support specifying identity file via environment variable #3149
Doh, my attempt at testing is probably failing due to the connection caching. Will have to revisit.
Force-pushed from 59e2a39 to 3a48742.
Codecov Report
@@ Coverage Diff @@
## master #3149 +/- ##
==========================================
- Coverage 89.47% 44.68% -44.8%
==========================================
Files 248 248
Lines 32703 32717 +14
==========================================
- Hits 29262 14619 -14643
- Misses 3441 18098 +14657
Codecov Report
@@ Coverage Diff @@
## master #3149 +/- ##
==========================================
- Coverage 90.71% 86.56% -4.15%
==========================================
Files 249 249
Lines 32723 32743 +20
==========================================
- Hits 29684 28345 -1339
- Misses 3039 4398 +1359
Force-pushed from 511acbc to 3a48742.
Hmph, with the latest iteration (3a48742), I'm stumped on why …
Could it be due to all the already existing control paths we established in other tests?
Right, that was along the lines of what I was thinking when I wrote "some sort of order/state dependence". The problem is that, even if the new test were using the same socket (it's not) and if it weren't closing it (it is), I don't see how … There's Travis-specific setup for these tests, so I've been restricting my local debugging to …
Force-pushed from 88e7281 to c89249d.
OK, so it looks like the issue is that, after patching … So I guess the main question is who should be responsible for calling … I'm leaning towards making …
This test was written at a time when the socket path basename was simply the plain hostname. The basename has been a hash of various connection details since 5be560a (BF+ENH: Hash-based unique ControlPaths for SSH (fixes gh-1243), 2017-01-31).
Upcoming commits will make it possible to configure an identity file to pass to ssh's -i. Consider the identity file when creating the connection hash because it's a defining feature of the connection.
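Here is a minimal Python sketch of the idea. It uses a hypothetical `connection_hash()` helper, not DataLad's actual `get_connection_hash()`. The identity file is one of the hashed fields, so changing it changes the socket basename, just as changing the host or port would. The choice of fields, SHA-256 and the 8-character truncation are all assumptions made for illustration.

```python
import hashlib


def connection_hash(hostname, port="", username="", identity_file=""):
    """Return a short hash identifying an SSH connection.

    Hypothetical helper: the field set, hash function and 8-character
    truncation are illustrative assumptions, not DataLad's implementation.
    """
    # Join the defining details with NUL, which cannot occur in any of
    # these fields, so ("ab", "c") and ("a", "bc") give different keys.
    key = "\0".join([hostname, str(port), username, identity_file])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]


def control_path(socket_dir, hostname, **details):
    """Build a ControlPath whose basename is the connection hash."""
    # Using the hash instead of the plain hostname gives two connections
    # to the same host, but with different identity files, distinct sockets.
    return "{}/{}".format(socket_dir.rstrip("/"),
                          connection_hash(hostname, **details))
```

With this scheme, a test that assumes the socket is named after the plain hostname will look for the wrong path. That is the kind of stale path construction the previous commit fixes.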
The next commit will make this accessible to outside callers by adjusting SSHManager and exposing a configuration variable.
This enables callers to use sshrun with non-standard identity files. For example, in the ReproMan project, we want to use DataLad commands with an AWS EC2 instance. In this case, the host will not be in .ssh/config, and the key files are usually not in the default location. A plain environment variable would do for the above use case, but add it as a variable in common_cfg.py for visibility and documentation.
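The sketch below illustrates roughly what passing an identity file amounts to. It assembles an `ssh` argument list and adds `-i` only when a key is configured. `identity_file_from_env()` and `build_ssh_args()` are hypothetical names, and this is not DataLad's sshrun code. The sketch also assumes that `DATALAD_SSH_IDENTITYFILE` is the environment spelling of `datalad.ssh.identityfile`.

```python
import os


def identity_file_from_env(environ=None):
    """Read the identity file from a DATALAD_* environment variable.

    Assumption: DATALAD_SSH_IDENTITYFILE mirrors datalad.ssh.identityfile.
    An empty value is treated the same as an unset one.
    """
    environ = os.environ if environ is None else environ
    return environ.get("DATALAD_SSH_IDENTITYFILE") or None


def build_ssh_args(host, identity_file=None, extra_args=()):
    """Assemble an ssh argv, passing -i only when an identity file is set.

    Hypothetical helper for illustration only.
    """
    args = ["ssh"]
    if identity_file:
        # -i is ssh's standard option for selecting a private key file.
        args += ["-i", os.path.expanduser(identity_file)]
    # Remaining ssh options go before the destination host.
    args.extend(extra_args)
    args.append(host)
    return args
```

In the EC2 scenario, a caller would export the key's path and then call `build_ssh_args(host, identity_file_from_env())`. No `~/.ssh/config` entry would be needed.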
Force-pushed from c89249d to 93c8651.
Done with the latest push (range-diff).
Eh, after hooking up this latest approach to ReproMan, I'm not sure I made the right decision (I should have tested earlier). I was hoping that using … So the options I see are …
@yarikoptic, do you have a preference or see another, better option?
Another option (currently my preference): Adjust …
"external" -- aren't you are using datalad Python API in reproman? shouldn't it then just change config directly ( |
Yes.
Regardless of what my specific code does, I was under the assumption that we wanted to support Python callers setting environment variables. That's the reason for the whole … So I don't mind going in the direction you suggest, though I'd prefer the last option I suggested, which would accomplish the same thing without the need for SSHManager to have an …
Whatever you like best. I don't think, though, that it is really bad to store …
I'm failing to see what the extra automation is. Either way you have a config.get call. By doing it in …
Regarding your memory about needing it in an environment variable: we might indeed need that (possibly setting it, if not yet set, while "treating config variable"), because a DataLad special remote started by some annex call might need it. But I don't think we have a specific use case for that yet.
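The idea floated here is to set the environment variable from the config value if it is not already set. A special remote started by git-annex would then inherit it. That could look roughly like the sketch below. `export_setting_to_env()` is a hypothetical helper, and the name mapping is an assumption: dots become underscores and the result is upper-cased, so `datalad.ssh.identityfile` becomes `DATALAD_SSH_IDENTITYFILE`.

```python
import os


def export_setting_to_env(name, value, environ=None):
    """Expose a config value to child processes as an environment variable.

    Hypothetical helper. An already-set variable is left untouched, so an
    explicit setting by the caller wins. Returns the variable name.
    """
    environ = os.environ if environ is None else environ
    var = name.replace(".", "_").upper()
    if value is not None and var not in environ:
        environ[var] = value
    return var
```

Any subprocess launched afterwards with the default (inherited) environment, such as a git-annex call that spawns a special remote, would see the variable.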
This reverts and replaces the sshconnector.py changes from 93c8651 (ENH: ssh: Support configuring an identity file, 2019-02-07).

93c8651 updated assure_initialized() to get datalad.ssh.identityfile from its local ConfigManager instance and store that value as an attribute for later use. The motivation for retrieving the value in assure_initialized() was to avoid the expense of calling datalad.cfg.reload(force=True) with each get_connection() call. In turn, the motivation for using reload(force=True) was to prevent test_ssh_custom_identity_file's patched DATALAD_SSH_IDENTITYFILE from leaking into other tests [*].

The problem with this approach is that Python callers must set the identity file before the first assure_initialized() call. That call could happen in SSHManager.get_connection() or _much earlier_, because GitRepo.__init__() also calls assure_initialized(). After the first assure_initialized() call, callers don't have a straightforward way of changing the setting.

Instead, let's use a plain datalad.cfg.get() call, without reloading, within get_connection(). This gives Python callers the ability to update the setting before _any_ get_connection() call, but they are now responsible for reloading datalad.cfg if needed.

[*]: datalad#3149 (comment)

Re: datalad#3149
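The trade-off in this commit message can be sketched with two hypothetical stand-ins. `CachedConfig` is a config object that only sees changes after `reload()`, and `ConnectionManager` reads the identity file inside `get_connection()` on every call. An update is therefore honored as long as the caller reloads the config first. Neither class is DataLad's real `ConfigManager` or `SSHManager`.

```python
class CachedConfig:
    """Hypothetical config stand-in: reads from `source` only on reload()."""

    def __init__(self, source):
        self._source = source
        self._cache = dict(source)

    def get(self, key, default=None):
        return self._cache.get(key, default)

    def reload(self):
        # Re-read the backing source; without this, updates stay invisible.
        self._cache = dict(self._source)


class ConnectionManager:
    """Hypothetical manager that looks up the identity file per call."""

    def __init__(self, cfg):
        self.cfg = cfg
        self._connections = {}

    def get_connection(self, host):
        # A plain get() on each call, with no reload, mirroring the approach
        # in the commit message rather than caching the value at init time.
        identity_file = self.cfg.get("datalad.ssh.identityfile")
        # The identity file is part of the connection's identity.
        key = (host, identity_file)
        if key not in self._connections:
            # Placeholder for opening a real SSH connection.
            self._connections[key] = {"host": host,
                                      "identity_file": identity_file}
        return self._connections[key]
```

In this sketch, changing the backing source has no effect until the caller calls `cfg.reload()`. This matches the note that callers are "now responsible for reloading datalad.cfg if needed".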
## 0.11.3 (Feb 19, 2019) -- read-me-gently

Just a few important fixes and minor enhancements.

### Fixes

- The logic for setting the maximum command line length now works around Python 3.4 returning an unreasonably high value for `SC_ARG_MAX` on Debian systems. ([#3165])

- DataLad commands that are conceptually "read-only", such as `datalad ls -L`, can fail when the caller lacks write permissions because git-annex tries merging remote git-annex branches to update information about availability. DataLad now disables `annex.merge-annex-branches` in some common "read-only" scenarios to avoid these failures. ([#3164])

### Enhancements and new features

- Accessing an "unbound" dataset method now automatically imports the necessary module rather than requiring an explicit import from the Python caller. For example, calling `Dataset.add` no longer needs to be preceded by `from datalad.distribution.add import Add` or an import of `datalad.api`. ([#3156])

- Configuring the new variable `datalad.ssh.identityfile` instructs DataLad to pass a value to the `-i` option of `ssh`. ([#3149]) ([#3168])

* tag '0.11.3': (35 commits)
  ENH: Finalized changelog entry for 0.11.3
  [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.3 entries
  CHANGELOG: Mention #3168
  ENH: sshconnector: Get identity file with datalad.cfg.get()
  [DATALAD RUNCMD] CHANGELOG: Linkify 0.11.3 entries
  CHANGELOG: Add entries for 0.11.3
  BF/RF(TST): skip test if actual sudo chown call fails
  BF(TMP): declare check_datasets_datalad_org failing on windows
  BF(TST): replace not relevant trailing .pull test with .repo_info
  BF(BK): for some reason an exception on repo_info invocation isn't raised while on travis
  TST: test_ro_operations via sudo (when possible)
  BF: allow to disallow git-annex branch merges for repo_info and use that in ls
  BF(unicode): use assure_unicode while formatting an exception
  BF(PY3): workaround for python3.4 on debian returning obnoxious SC_ARG_MAX
  BF: revert change for repo_info about not merging remote git-annex
  BF: set annex.merge-annex-branches=false for invocations not requiring updated remote information
  ENH: ssh: Support configuring an identity file
  ENH: SSHConnection: Support custom identity files
  ENH: ssh: Add identity_file parameter to get_connection_hash()
  TST+BF: sshconnector: Fix stale socket path construction
  ...
* tag '0.11.3':
  BF(TST): skip if windows OR root; catch any exception

* tag '0.11.3':
  Boost version to 0.11.3
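The first two fixes in the 0.11.3 changelog can be approximated as follows. The sketch clamps `os.sysconf("SC_ARG_MAX")` to an assumed ceiling, and it builds a git-annex command with `annex.merge-annex-branches` disabled for a single invocation. The constants and helper names are hypothetical and do not reflect DataLad's actual values or code.

```python
import os

# Assumed bounds for illustration; DataLad's actual numbers may differ.
FALLBACK_ARG_MAX = 2 ** 17  # used when the system gives no usable answer
CAP_ARG_MAX = 2 ** 21       # ceiling applied to implausibly large reports


def max_cmdline_length(cap=CAP_ARG_MAX, fallback=FALLBACK_ARG_MAX):
    """Return a command-line length limit that is never implausibly large."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows; unknown names raise ValueError.
        value = -1
    if value <= 0:
        # sysconf reports -1 when there is no definite limit.
        value = fallback
    return min(value, cap)


def read_only_annex_cmd(*args):
    """Build a git-annex call that skips merging remote git-annex branches."""
    # `git -c key=value` applies a config value to this one invocation only.
    return ["git", "-c", "annex.merge-annex-branches=false", "annex", *args]
```

Passing the setting with `-c` avoids writing to the repository's config, which matters when the caller lacks write permissions.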